A Voice User Interface (VUI) is an Artificial Intelligence (AI) tool that enables
children to access a computing device and complete tasks through speech instead
of conventional input methods. A VUI takes the sounds that a child articulates in
a spoken statement and uses intent recognition to understand the action required
to fulfill the child’s spoken request. The design and features of VUI have been
developed to increase the interpersonal level of communication with users and, to
some degree, make voice assistants behave like humans. These features have been
shaped to improve learning efficacy and ease of use for early childhood learning
development. The VUIs currently available on the market are geared to provide
children with a simpler way to access educational technology learning tools. The
research posits that there are two primary uses of VUI in childhood learning
development: children use VUI as a form of entertainment and information seeking,
and children use VUI to develop various knowledge facets. For children in the
early language stages who are already using language to communicate, VUI language
stimulation can help them engage in continuous communication processes, use and
understand various words, and successfully complete more complex sentences. The
research seeks to state the problems associated with VUI and the standard
opinions, based on research, associated with those problems. Moreover, the study
seeks to support the hypothesis that VUI is an effective tool for early childhood
language learning with peer-reviewed evidence and examples, and to generate new
and innovative perspectives.


1. Introduction 


Unlike traditional communication mechanisms that
require input and output devices [1], a Voice User Interface
(VUI) allows users to interact with electronic devices
through speech. Users of VUI are able to interact with
electronic devices by talking to them, comparable to a natural
conversation. The primary advantage of a VUI is that it allows
for a hands-free, eyes-free way in which users can interact
with a product without having to hold the device. Because
users normally associate voice with interpersonal
communication rather than with person-technology
interaction, they are sometimes unsure how much complexity
the VUI can understand. Hence, successful VUI interaction
requires not only a system able to understand spoken
language but also users who are aware of what type of
voice commands they can use and what type of interactions
they can perform. The elaborate nature of a user’s
communication with a VUI indicates that a designer must be
cognizant of how a user may potentially have high
expectations. This is why it is important to design a
product in such a simple manner that the user remains mindful
that a two-way “human” conversation is impossible [2]. Moreover,
the user’s patience in building a communications “rapport” 
likely helps user satisfaction when the VUI becomes more 
familiar with the speaker’s voice, and thus, provides the 
speaker with more accurate responses. In this context, hands-free
interaction with VUI devices provides significant benefits
for children as a means of language learning. For the purposes 
of this paper, we define children to include early adolescents: 
those 12 years of age and younger. VUI’s support of children’s 
learning has been well documented, and the trends suggest 
that VUI is altering the way children relate to technology as a 
means to develop language skills. The use of technology by 
children is increasing, and according to Brody [3], these users 
are accessing digital devices, often before they are exposed to
books. Moreover, global trends suggest that technology use is
increasing and that the age of first access is falling [4].
In countries with high rates of connectivity,
young people outnumber other age demographics in terms of
overall online populations [5]. Although research suggests
that children prefer using the internet for gaming, chatting,
and social networking purposes [6], there is limited research
on VUI's impact on young children’s language development
and on whether technology use has any negative impacts.
Recognizing the promise of VUIs for strengthening children's
language development, application developers have
increasingly created voice-based apps targeted at children.
These apps are designed to engage in dialogue with children
based on pre-designed dialogical flows and focus on
activities with interactive speech-based content [7]. However,
these designs are not without challenges. For example, a VUI's
dialogic interactions are pre-defined, and the efficacy of a
user’s interaction with the VUI is reliant on the child
answering in a manner that the VUI's designers anticipated.
Thus, when a child responds in an unexpected way,
the VUI cannot reliably provide comprehensible feedback [8].
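
The limitation described above can be made concrete with a minimal sketch of one
turn in a pre-designed dialogical flow. The names DialogStep and run_step, the
example prompt, and the keyword matching are hypothetical and are not taken from
any of the cited apps; the sketch only shows that a reply outside the anticipated
set falls through to a generic re-prompt.

```python
from dataclasses import dataclass


@dataclass
class DialogStep:
    """One scripted turn of a hypothetical pre-designed dialog flow."""
    prompt: str
    expected: dict   # anticipated keyword -> scripted reply
    fallback: str    # generic reply used when no keyword matches


def run_step(step, child_utterance):
    """Return the scripted reply for a single child utterance."""
    # Normalize the utterance: lowercase and strip simple punctuation.
    words = [w.strip(".,!?") for w in child_utterance.lower().split()]
    for keyword, reply in step.expected.items():
        if keyword in words:
            return reply
    # Anything the designers did not anticipate ends up here.
    return step.fallback


step = DialogStep(
    prompt="What sound does a cow make?",
    expected={"moo": "Yes! A cow says moo."},
    fallback="Hmm, I did not catch that. What sound does a cow make?",
)
```

An answer such as "my cow is brown" is perfectly sensible from the child's point
of view, yet it reaches only the fallback branch, which is the kind of
breakdown the passage describes.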


2. Currently available VUIs on the market and their features

An inclusive design approach that facilitates the 
participation of young users, their caregivers, and local 
communities in the life cycle of a VUI project is critical for 
children’s empowerment and for responsible VUI innovation. 
If children are going to interact with VUI systems, for instance,
by sharing their stories and emotions with
a companion language application, their perspectives and
preferences must be included in the design process so that the
VUI application not only fits their language learning needs but
also respects their rights. Hence, the involvement of children,
their guardians, and stakeholders in the education
community can help ensure that AI systems are fair and
non-discriminatory [9]. Moreover, to create satisfactory user
experiences with VUI, designers need to understand how 
children naturally communicate with their voices, in addition 
to being cognizant of the fundamentals of voice interaction. In 
their book on voice interaction, Wired for Speech [10], Nass
and Brave posit that users often relate to voice interfaces in
much the same way that they relate to other humans. This is
largely because speech is fundamental to
human communication. This suggests that to understand
users’ fundamental expectations of VUI, developers must
understand the language principles that govern human
communication. Since a VUI cannot completely meet a
child’s expectations of a natural conversation partner,
it becomes increasingly important to design the
voice user interface so that it conveys an appropriate
amount of information and manages children’s expectations
suitably. To amplify the benefits of VUI as a conversational
agent for young children, it is imperative to recognize the 
cognitive abilities and specific communication needs of 
children. To maximize the market potential of VUI, developers
must combine child development research with child-agent
interaction research, so that developers and
educators are better equipped to create evidence-driven
methodologies for the improvement and evaluation of VUIs as
children's language learning partners [11]. When creating a
VUI, one does not begin with an existing Artificial Intelligence
system. Initially, a robust foundation must be developed to
process programmed dialogues. Once this succeeds,
the VUI and user conversations must be continuously tested.
Therefore, the initial step is conceptual. For VUI, this idea is
made possible through the user’s voice. Supporting such
comprehensive ideas requires data analysis, and once
inferences have been drawn from these analyses, VUI design can
commence. Throughout the design process, all aspects of the
configuration must be continuously tested
until a validated functional concept emerges. Amendments are
an integrated and essential step of AI design before, during,
and long after development. Only when a minimum viable
product has been established based on a series of dialogues
can large-scale collection of data from user
insights begin [12].

Considering the effort needed to create VUI, when 
developed appropriately, VUI offers substantial opportunities 
and learning applications for children through social 
interactions. As noted, however, these tools must be
meticulously developed and designed to meet children’s
developmental needs, and this can be accomplished by
incorporating relevant information about how children feel
and act. Therefore, VUI design must naturally target the
population it aims to serve, because children’s interactions
with VUI not only influence their current actions and thoughts
but also affect how they will interact with other people
in the future [13]. As a result of escalating international
market demand for VUIs that support young
children's language development, developers have created
thousands of such VUI applications for children [7].
These applications engage in conversation with children
based on pre-designed dialogical flows and communicate
with children via specific activities with collaborative
speech-based content [14]. Moreover, these educational resources
are particularly valuable for preschool-aged children who
have not yet learned to read or write and who primarily
depend on oral communication [13]. Similar to children's
interactions with parents or teachers, language interactions 
with VUI apps must be thoughtfully designed to actualize 
their intended educational and developmental goals for 
children [15]. Many studies suggest that VUIs’ conversation 
design should focus on human-to-human communication 
[16], but these recommendations are often not tailored for 
specific user groups [17]. Because of children's developing 
cognitive and language abilities, it is paramount that effective 
adult-child communication strategies be utilized when 
designing VUIs intended for use by young children [18]. 
However, there is limited research devoted to communication 
strategies that need to be incorporated into VUI apps for 
children. 

Developers are continuously working on algorithms to
give VUIs social characteristics and specific personalities. The
idea of providing voice interfaces for children’s applications 
is not a new one; however, the scope of the systems that have 
been developed thus far has been relatively limited. Examples 
of spoken dialog system prototypes for children include word 
games for pre-schoolers [19], aids for reading, and 
pronunciation tutoring [20]. Historically, multimodal 
interfaces that combine speech with a variety of other input 
modalities such as text, touch, mouse clicks, handwriting, and 
gestures have been designed [21]. Results of these designs 
indicate that multiple modalities, rather than a single 
modality, lead to more efficient and natural interaction and 
enhance the overall user experience. Multimodality is deemed
best for developing conversational interfaces for children
because it can overcome the limitations of speech
technology. Creating an effective user interface for children’s
language learning VUI entails consideration of the following:
(i) the data requirements of the task, (ii) the constraints and
capabilities of the voice technology, and (iii) the expectations,
knowledge, and inclinations of the user. By understanding
these aspects, the VUI designer can anticipate challenges and
incongruities that may impact the overall success of the VUI
and design the interface to mitigate their impact. For ideal
results, user interface design must be an essential and early
factor in the whole design of a system. User interface design
and application are most effective as an iterative process, with
interfaces tested analytically on groups of child users,
amended as shortcomings are detected, and then
retested until system performance is balanced and adequate
[21]. Building a VUI for children is a stimulating process
encompassing several steps. The first step is creating a proof
of concept for the use of speech as a practical way for children
to interact with VUI in terms of viability and usability. Second,
information from child users must be collected to
quantify the unpredictability present in their speech and
to train and test models for automatic speech recognition
(ASR) and spoken language understanding (SLU). This must
be done to ensure satisfactory levels of ASR and SLU
operation across all ages and conditions. Finally, insight and
conclusions from these data analyses and modeling [22] can
be used to produce prototype systems.

J. George et al. /Future Technology

Large vendors of commercial voice assistants offer their 
own distinct guidelines for VUI developers [23]. These 
guidelines offer support for developing applications for 
specific platforms. In the context of platform-independent
options, models, and design tools, earlier work presented a set of
design principles for VUI applications in which the VUI takes the
role of a faithful servant, while [24] analyzed and modeled users’
behavior patterns in interaction with unfamiliar VUIs. Researchers
have built several tools in support of VUI design [25]. SUEDE
enabled Wizard-of-Oz style prototyping of VUIs. SPICE and
SToNE are toolkits for helping developers and researchers
design speech recognizers for VUI applications [21]. To help
designers modify the integrated voice in more useful and
cost-effective ways, Amazon and IBM created their own
innovative SSML (Speech Synthesis Markup Language) tags
that combine the effects of several primitive standard SSML
tags [26].
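
As a rough illustration of the primitive standard SSML tags mentioned above, the
following sketch builds a small SSML fragment that slows the speaking rate and
inserts pauses between short sentences. It uses only elements defined in the
W3C SSML standard (speak, prosody, s, break); no vendor-specific tags are used.
The function name child_friendly_ssml and its default values are hypothetical,
and the version and namespace attributes that strict SSML conformance expects on
the speak element are omitted here for brevity.

```python
import xml.etree.ElementTree as ET


def child_friendly_ssml(sentences, rate="slow", pause_ms=400):
    """Wrap short sentences in standard SSML prosody and break tags."""
    speak = ET.Element("speak")
    # <prosody rate="..."> controls the speaking rate of everything inside it.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate})
    for i, sentence in enumerate(sentences):
        s = ET.SubElement(prosody, "s")  # <s> marks one sentence
        s.text = sentence
        if i < len(sentences) - 1:
            # <break time="..."/> inserts a pause between sentences.
            ET.SubElement(prosody, "break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")


markup = child_friendly_ssml(["Hi!", "What is your name?"])
```

Whether a given voice platform honors each of these tags, and how it renders
them, depends on that platform's own SSML support.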

To design effective VUI, developers must enhance 
mechanisms to provide children with missing information 
about what they can do and how they can do it without 
confusing them. Hence, developers are responsible for
gauging the expectations users bring from their experience
with daily and routine conversations. Human
communication is context-bound; in voice
interaction, however, child users must be taught how to express
their needs in a manner that the VUI can comprehend. Moreover,
developers can improve ease of use by providing
information about what child users can do and what
functionality they are using, informing them how to
communicate their goals in a way that the system
understands, keeping sentences short, and offering visual
feedback so they know whether the VUI is comprehending their
intentions. In some regards, VUI presents more challenges
than a graphically based system; however, VUI is
becoming more predominant as more aspects of everyday life
feature voice-controlled interaction [2].


3. Importance of speech and language stimulation in 
children 
Language is the ability to communicate with others.
Language includes all forms of communication, expressed in
multiple ways, such as oral, written, sign language, gestures,
facial expressions, or art. Spoken language is the most
valuable form of communication and the most significant and
commonly used [27]. Language learning is a consequence of
the collaboration between a child’s learning capabilities
and the language setting [28]. General language stimulation
approaches include modifications of the physical and
linguistic situations to expand opportunities for children to
hear developmentally suitable language and to use
language built on their abilities. General language stimulation
does not focus on specific language types or communication
actions, and the intervention agent never tells the child
directly to produce any particular words, word patterns, or
grammatical structures. Instead, the intervention
concentrates on establishing a rich language atmosphere that
is tailored to the child’s interests and talents. Children may
then concentrate on those facets of language that they are
most ready to learn [29]. Some researchers suggest that
VUI-led communication is a collaborative interaction akin to
interpersonal communication, with the VUI taking the role of
a partner in children's language learning [30]. However, there
are questions remaining regarding the effectiveness of VUI in 
assisting in children’s language development. 


4. VUI's contribution to children's language development

Sociocultural theory defines language development
as a process in which children learn language skills through
cooperative dialogue with members of society in daily
activities [31]. Through back-and-forth conversations with
knowledgeable language partners who offer scaffolding and
facilitate active participation, children gain knowledge by
concentrating attention, expressing thoughts, and reflecting
on the discussed topics [32]. Language development is a
primary indicator of the comprehensive development of
children's cognitive abilities related to success in school [33].
In the beginning, children's language is egocentric, that is, a
form of language centered on the self. It then
gradually develops into a social language, which is used to
relate or exchange ideas and influence others. In this case, the
language used often takes the form of complaints, negative
comments, criticisms, and questions. When a child's language
changes from egocentric to social language, the union 
between language and thought is essential for the formation 
of the child's mental or cognitive structure. In the first years 
of life, language must be learned as a way of communication 
and a way to enter into a community and society. Children 
have a yearning to belong and to effectively communicate 
their needs and interests. To accomplish this, they must 
master the skills to communicate with others, which entails 
expressing themselves and understanding others. Initially, 
they accomplish this via expressions, sounds, touches, and 
body movements. Children progressively develop more 
particular means to express themselves, such as nonverbal 
gestures and facial expressions, tone of voice and sounds, and 
verbal words and sentences. However, the distinction 
between these modes is primarily analytical: Children need to 
express themselves, and they make use of all available means 
to accomplish that. For young children, language is just one 
instrument in the ensemble of all means of expression.
Language is not all-encompassing but instead provides 
higher-level goals and the learning of specific communication 
tasks. Within a child’s educational context, language 
achievement is critical and has lifelong consequences. Early 
language abilities and the quality of early education and care 
settings have been shown to be linked with successful school
performance in older children and adolescents. Inadequate
language competencies hamper the achievement of cognitive,
emotional, and social abilities. Moreover, early language
education supports children's initial integration and
successful language retention: even before entering
the school system, children become familiar with learning
opportunities in their social environment, the neighborhood,
the community, and the educational system (schools); they
have social contact and play with other
children; and they can develop the language of their social
environments in conversations with caregivers and other
children. Early language attainment can be hindered by young
children’s individual characteristics (for example, physical
limitations), by conditions in their social environment (for
example, poverty, lack of parental involvement, or negative
media utilization), or by a mixture of personal and social factors.
Thus, it is paramount to support children’s early language
acquisition in all of their contexts rather than to concentrate
solely on children’s or parents’ language shortfalls. Promoting
language development should entail building the capacity of
children and parents to use accessible resources to enhance
children’s language development. This resources-based
strategy supports the building of trusting educational
partnerships that include information technology and, more
specifically, VUI to advance children’s educational
competencies.

Language is a cultural tool and is developed in
social interaction [34]; therefore, environmental context plays
an integral role in language acquisition. Heath [35]
suggested that the following contexts be considered: Under
what spatial-material conditions do children grow up? What
caregivers (parents, siblings, grandparents, and other familiar
persons) do they have at hand in their daily lives? What
languages are they exposed to? How do people in their 
surroundings communicate, play, teach, and learn? What 
media do the children have access to, and how are they used? 
How does the family spend their everyday time and their 
leisure time? The economic, social, and cultural capital of 
families creates very different conditions for language 
acquisition [35]. At the level of individual circumstances, it is
not so much structural factors, such as parents’ educational
background or socioeconomic conditions, that are most
significant; what is pivotal are the specific language and
education traditions in daily family life [36]. For example,
there is a positive association between children’s language
abilities and the accessibility of age-suitable books in the
home, as well as the frequency and linguistic complexity of
language exchanges [37]. As has been noted, early learning is
important in children’s language progress in receptive and
productive linguistic abilities. Language acquisition through experience
is what happens during early childhood, where the language 
is ingrained into the child’s mind subconsciously [38]. In this 
regard, in contemporary society, the use of technology is an 
important factor that impacts the language development of 
children. Pauwels [39] indicated that technology is often a 
part of children’s everyday environment and its impact and 
effect on language is unquestionably meaningful. As a result 
of the promising development of artificial intelligence, 
children are increasingly interacting with non-human 
intelligent agents through speech, gesture, or writing. VUI 
that supports natural speech interaction is especially valuable 
for young children, whose limited literacy makes
digital environments difficult to navigate [15]. Research
suggests that in some cases VUI use creates obstacles to
language development because children spend
more time interacting with the device than talking to their
peers and interacting with other humans. However, when used in
the proper context, VUI provides excellent
stimulation that can be used for increased language
development. These points are further supported by [40], who
suggest that VUI impacts early childhood language
development.


5. Various uses of VUI in children’s learning 

VUI has been shown to support educators and caregivers
in home and classroom supervision while offering
opportunities for voice-driven learning through dialogue-driven
interactions with single and multiple turn-takings; opportunities to
enhance fluency, as well as active (speaking) and passive
(listening) competencies; access to a range of actions or skills
involving knowledge-seeking behaviors; one-on-one
individualized language learning and language practice
support [41]; and instant access to subject matter that is
accurate and objective. Working with VUI in the language
development setting involves developing significant speaking 
opportunities incorporated in a manner that gives children 
the tools to use that language in the future [42]. This is 
particularly pertinent because children are now living with 
Artificial Intelligence and VUI as part of their daily lives, and 
many children are using voice-assisted technologies
primarily for information searches, question answering, and
entertainment [43]. However, students need to
be empowered with the skills to evaluate this
information and decide how best to make it relevant
to their requirements, whether for resolving challenges,
accomplishing specific responsibilities, or reaching specific
conclusions. A study by Sowmya et al. [44] suggests that children
with frequent gadget use, including VUI use, scored higher on
language development tests than children with low gadget
usage. Thus, gadget and VUI use may
have an encouraging impact on children’s language
development. According to UNICEF, as the influence of VUI
and gadgets grows, children broaden their knowledge base;
thus, it is important that VUI innovation is driven by
children's developmental needs [45]. VUIs designed specifically
for children have also demonstrated how VUI, an AI technology, could
influence children's development in a positive manner [46].
The rapid development of VUI is redefining what language
partnership means. Language partners are no longer restricted to humans
but also extended to VUIs with agents that are created to 
understand complex speech input [15]. According to 
Tomasello [47], countless children now interact regularly
with VUI in their own homes, and researchers see this
child-agent conversation as a meaningful addition to children's
daily language encounters. A promising wealth of research
using interviews, observations, and in-home audio recordings
has illustrated two types of exchanges children commonly
have with VUIs: open-domain conversations with general
assistant tools [48] and agent-led, context-specific
conversations. In the latter, the VUI leads children along a
pre-designed dialogic path on a specific topic
[49]. Several research projects have created experimental
VUIs that combine helpful guided conversation approaches as
found in the traditional literature, specifically, the
prompt-response-scaffolding cycle [14]. Prompt-response-scaffolding
involves the use of written or voice prompts or cues to assist 
children to perform a task or use a strategy, and children can 
use these as a reference to reduce confusion and frustration 
[50]. For example, one study developed a storytelling app that 
asks children open-ended questions, provides responses, and 
follows up on children's incorrect responses with helpful 
hints [14]. Effective conversation design is especially
important when agent-led interactions are meant to support
young children's language development. This
matters because young children are
still developing the cognitive aptitudes,
communicative abilities, and mental representations needed to
interact with a digital speaker [51]. At the same time, because a
VUI's interpretation of what a child is attempting to express is
centered primarily on the predesigned dialogic roadmap,
VUIs cannot adjust to the ebb and flow of conversation
as naturally as a human language partner [52].
These two factors lead to child-agent conversations being 
susceptible to failure. Many studies detailing such 
conversation interruptions have concentrated on how 
children struggle to adapt their communication strategies to 
prevent possible conversation breakdowns [52]. 
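
The prompt-response-scaffolding cycle described above can be sketched as a small
loop. This is an illustrative sketch under stated assumptions, not the design of
the storytelling app in [14]: the function name scaffolded_turn, the hint list,
and the substring check used to judge an answer are hypothetical, and the child's
replies are passed in as plain text rather than coming from a speech recognizer.

```python
def scaffolded_turn(question, answer, hints, replies):
    """Run one prompt-response-scaffolding cycle over a list of child replies.

    Returns the list of things the agent says, in order.
    """
    transcript = [question]              # prompt
    hints_left = list(hints)
    for reply in replies:                # response
        if answer in reply.lower():
            transcript.append("That's right!")
            return transcript
        if hints_left:
            # Scaffolding: follow up an incorrect reply with the next hint.
            transcript.append(hints_left.pop(0))
        else:
            break
    # No correct reply and no hints left: give the answer and move on.
    transcript.append(f"The answer is {answer}.")
    return transcript


said = scaffolded_turn(
    question="Where did the bear sleep?",
    answer="cave",
    hints=["Think about somewhere dark.", "It starts with the letter c."],
    replies=["in a tree", "a cave"],
)
```

In this example the agent asks the question, offers the first hint after the
incorrect reply, and confirms the second reply. A reply the script cannot
interpret at all is treated the same as a wrong answer, which is where the
conversation breakdowns discussed above tend to arise.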

Hill et al. [15], in their study of VUI for young children, 
found three reasons for their engagement: exploration for 
enjoyment, information-seeking, and as a way of operating a 
specific device. In this context, Winkler et al. [54] developed 
Zhorai, a Conversational Agent (CA) that supports children’s 
exploration of AI algorithms and machine learning. Lin et al. 
also revealed that by training an agent, examining its 
mistakes, and reorienting the agent, children could appreciate 
the agent’s ability to learn and recognize the learning 
algorithms used by it. Researchers have shown interest in
using CAs, as well as social robots, as a positive intervention
for children with special needs [54]. One such example is
PunkBuddy, a tool with a chatbot that helps dyslexic
children learn through interaction. The chatbot teaches
children the rules of punctuation through clear
instructions [55]. Xu and Warschauer [56] created a VUI for
children with ADHD to help with their daily tasks. The VUI
provides vocal feedback to the child and urges them to
complete the task, and, in turn, the child provides feedback to
the VUI about their progress. Moreover, Wu et al. [57]
developed a chatbot for children with autism spectrum
disorder (ASD) to enhance their ability to hold a conversation.
Their chatbot stimulates the curiosity of children and tries to
help them better understand conversations.
Social-assistance CAs are frequently used to assist children and
adults with special needs, especially children with ASD [58],
and some researchers have suggested that a child with ASD
could find it simpler to relate to a social robot than to a
human educator or caregiver [57]. Additionally, Ziyad [58]
developed a social robot to improve the
social-communication skills of children with ASD. The robot can
move or talk based on a selected assignment defined by the
caregiver. The researchers indicated that after a one-month
deployment, the children with ASD improved their behavior
and increased their independence levels. Moreover, [36]
developed QTrobot, a social robot to help children with ASD
focus their minds, emulate positive conduct, and decrease
repetitive behaviors.


6. Conclusion and future perspective 

Recognizing the potential benefits and identified 
challenges, global and nationally contextualized VUI 
strategies have now begun to concentrate on mechanisms that
improve the delivery of educational services for
young children’s language development [34]. As noted,
VUI-based interactive games, chatbots, and robots have presented
innovative platforms for children to communicate with others
and think creatively, which are significant skill sets
necessary for the digital age in which children are being
raised [22]. If VUI is to gain
educational significance, it needs to embrace more learning
tools beyond being a question-and-answer gadget [9]. As
innovation occurs and technology develops, society,
educators, and caregivers, as stakeholders in children’s
development, must embrace the opportunity to facilitate
learning and supplement it with VUI technology to enhance
children’s imagination and encourage active language
acquisition. Going forward, an inclusive design
method that incorporates the participation of children,
caregivers, and local communities in the life cycle of VUI
projects that support language development is essential for
children’s empowerment and for responsible VUI innovation.
If children are going to interact with VUI in their language
development, their viewpoints and needs should be
incorporated into the design process so that the VUI
application not only suits their needs but also respects their
rights as children. Finally, the inclusion of children,
caregivers, and other relevant stakeholders can help
guarantee that VUI systems are fair and non-discriminatory.